The Difficulty of Reduced Error Pruning of Leveled Branching Programs
Author
Abstract
Induction of decision trees is one of the most successful approaches to supervised machine learning. Branching programs are a generalization of decision trees and, by the boosting analysis, exponentially more efficiently learnable than decision trees. In experiments this advantage has not been seen to materialize. Decision trees are easy to simplify using pruning; for branching programs no such algorithms are known. We prove that reduced error pruning of branching programs is infeasible: finding the optimal pruning of a branching program with respect to a set of pruning examples that is separate from the set of training examples is NP-complete. Therefore, we are forced to consider approximate solutions to this problem. We also prove that finding an approximate solution of arbitrary accuracy is computationally intractable. In particular, the reduced error pruning of branching programs is APX-hard.
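For contrast with the hardness result for branching programs, reduced error pruning is simple and greedy-optimal for decision trees. A minimal sketch, assuming a binary-feature tree; the `Node` representation and the bottom-up traversal are illustrative choices, not the paper's formalism:

```python
# Reduced error pruning (REP) sketch for binary decision trees.
# Each internal node tests one binary feature; each node stores the
# majority label of the training examples that reached it.

class Node:
    def __init__(self, feature=None, left=None, right=None, label=None):
        self.feature = feature   # None for a leaf
        self.left = left
        self.right = right
        self.label = label       # majority class at this node

    def is_leaf(self):
        return self.feature is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] == 0 else node.right
    return node.label

def errors(node, data):
    return sum(1 for x, y in data if predict(node, x) != y)

def rep(node, pruning_data):
    """Bottom-up pass over the tree: replace a subtree by a leaf
    (predicting the node's majority label) whenever this does not
    increase the error on the separate pruning set.  This greedy
    procedure is optimal for trees; the paper shows the analogous
    problem for branching programs is NP-complete and APX-hard."""
    if node.is_leaf():
        return node
    node.left = rep(node.left,
                    [(x, y) for x, y in pruning_data if x[node.feature] == 0])
    node.right = rep(node.right,
                     [(x, y) for x, y in pruning_data if x[node.feature] == 1])
    leaf = Node(label=node.label)
    if errors(leaf, pruning_data) <= errors(node, pruning_data):
        return leaf
    return node
```

The key point exploited above is that pruning decisions in a tree are independent across subtrees, because each pruning example reaches exactly one leaf; in a branching program, merged paths destroy this independence, which is the source of the hardness.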
Similar Papers
Learning Small Trees and Graphs that Generalize
In this Thesis we study issues related to learning small tree and graph formed classifiers. First, we study reduced error pruning of decision trees and branching programs. We analyze the behavior of a reduced error pruning algorithm for decision trees under various probabilistic assumptions on the pruning data. As a result we get, e.g., new upper bounds for the probability of replacing a tree t...
An Analysis of Reduced Error Pruning
Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems o...
Reduced Error Pruning of branching programs cannot be approximated to within a logarithmic factor
In this paper, we prove under a plausible complexity hypothesis that Reduced Error Pruning of branching programs is hard to approximate within log^{1−δ} n, for every δ > 0, where n is the number of description variables, a measure of the problem's complexity. The result holds under the assumption that NP problems do not admit deterministic, slightly superpolynomial time algorithms: NP ⊄ TIME(|I|^O...
Pruning Decision Trees with Misclassification Costs (appears in ECML-98 as a research note)
We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical com...
Pruning Decision Trees with Misclassification Costs (08-FEB-1998)
We describe an experimental study of pruning methods for decision tree classifiers in two learning situations: minimizing loss and probability estimation. In addition to the two most common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and two pruning variants based on Laplace corrections. W...